SVM Model Tampering and Anchored Learning: A Case Study in Hebrew NP Chunking
Abstract
We study the problem of porting a known NLP method to a language with few existing NLP resources, specifically SVM-based chunking for Hebrew. We introduce two SVM-based methods: Model Tampering and Anchored Learning. These allow fine-grained analysis of the learned SVM models, which provides guidance for identifying errors in the training corpus, distinguishing the role and interaction of lexical features, and eventually constructing a model with ∼10% error reduction. The resulting chunker is shown to be robust in the presence of noise in the training corpus, relies on fewer lexical features than was previously understood, and achieves an F-measure of 92.2 on automatically PoS-tagged text. The SVM analysis methods also provide general insight into SVM-based chunking.
Similar resources
Noun Phrase Chunking in Hebrew: Influence of Lexical and Morphological Features
We present a method for Noun Phrase chunking in Hebrew. We show that the traditional definition of base-NPs as nonrecursive noun phrases does not apply in Hebrew, and propose an alternative definition of Simple NPs. We review syntactic properties of Hebrew related to noun phrases, which indicate that the task of Hebrew SimpleNP chunking is harder than base-NP chunking in English. As a confirmat...
SVM Anchored Learning and Model Tampering for Machine Translation Evaluation
We conduct a set of experiments exploring issues of multiple kernels and kernel learning with respect to a previous work on automatic machine translation evaluation that used SVMs. By applying the techniques of model tampering and anchored learning to the learned SVM(s), we demonstrate both their practicality and utility, especially when used for error analysis and to better understand a given ...
中文名詞組的辨識:監督式與半監督式學習法的實驗 (Chinese NP Chunking: Experiments with Supervised, and Semisupervised Learning) [In Chinese]
This paper uses Yamcha, an SVM tool designed by Taku Kudo, to train an NP-chunking model for Chinese. In addition to IOB tags and the two words surrounding the focus word, we experimented with new features and exploited unlabeled data from web pages to enhance the previous model. Our experiments with supervised learning indicate that our chosen feature sets outperform those reported in previous studi...
Tagging a Hebrew Corpus: the Case of Participles
We report on an effort to build a corpus of Modern Hebrew tagged with parts of speech and morphology. We designed a tagset specific to Hebrew while focusing on four aspects: the tagset should be consistent with common linguistic knowledge; there should be maximal agreement among taggers as to the tags assigned to maintain consistency; the tagset should be useful for machine taggers and learning...
Application of Machine Learning Approaches in Rainfall-Runoff Modeling (Case Study: Zayandeh_Rood Basin in Iran)
Runoff resulting from rainfall is the main source of water in most parts of the world. Therefore, predicting the volume of runoff produced by rainfall is becoming increasingly important in the control, harvesting and management of surface water. In this research a number of machine learning and data mining methods including support vector machines, regression trees (CART algorithm), model tree...